In 1932 a posthumously published article by the Cambridge philosopher
W.E. Johnson showed how symmetric Dirichlet priors for infinitely
exchangeable multinomial sequences could be characterized by a simple
property termed "Johnson's sufficiency postulate" by I.J. Good (1965).²
Johnson could prove such a result, prior to the appearance of de Finetti's
work on exchangeability and the representation theorem, for Johnson had
himself already invented the concept of exchangeability, dubbed by him
the "permutation postulate" (see Johnson (1924, p. 183)). Johnson's
contributions were largely overlooked by philosophers and statisticians
alike until the publication of Good's 1965 monograph, which discussed and
made serious use of Johnson's result.

Due perhaps in part to the posthumous nature of its publication,
Johnson's proof was only sketched and contains several gaps and
ambiguities; the major purpose of this paper is to present a complete
version of Johnson's proof. This seems of interest both because of the
result's intrinsic importance for Bayesian statistics and because the proof
itself is a simple and elegant argument which requires little technical
apparatus. Furthermore, it can be easily generalized to characterize both
asymmetric Dirichlet priors and finitely exchangeable sequences with
posterior expectation of success linear in the frequency count, and the
proof below is given in this generality.

After sketching the background to Johnson's result in Section 1,
the generalization of his proof mentioned above is given in Section 2.
Section 3 discusses a number of complements to the result and some open
problems it raises, and Section 4 concludes with a historical note on
Johnson and the reception of his work in the philosophical literature.

1. The Bayesian Background. Let $X_1, X_2, \ldots$ be an infinite
exchangeable sequence of 0's and 1's (to be thought of as indicators of some
event $E$), and let $S_N = X_1 + \cdots + X_N$. Then, as first shown by
de Finetti, it follows from exchangeability that the limiting frequency

(1.1)    $Z = \lim_{N \to \infty} S_N / N$

exists almost surely, and that

(1.2)    $P\{S_N = k\} = \binom{N}{k} \int_0^1 p^k (1-p)^{N-k} \, dF(p)$

for every $N \ge 1$ and $0 \le k \le N$, where $F(p) = P\{Z \le p\}$ is the
cumulative distribution function of $Z$. If the parameter $p$ is thought of
as a propensity or "objective probability," then $dF$ may be regarded as the
degree of belief about or "subjective probability" of the true value of $p$.

Traditionally, the "flat" prior $dF(p) = dp$ was taken to express
"complete ignorance" about $p$, or the likelihood of the event $E$ (for
which the $X_i$ serve as indicators). Bayes's own justification for this
was to take $P\{S_N = k\} = (N+1)^{-1}$ as quantifying complete ignorance
about $E$, observe that (1.2) gave precisely this result (for all $k$
and $N$) when $dF(p) = dp$, and then conclude that $dF(p)$ is $dp$.³
Laplace justified the choice somewhat more directly by invoking the
so-called principle of insufficient reason.
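As a quick check of Bayes's observation, the following sketch evaluates
(1.2) exactly under the flat prior, using the standard beta integral
$\int_0^1 p^k (1-p)^{N-k} \, dp = k!\,(N-k)!/(N+1)!$. The function name
`prob_sn_flat` is an illustrative choice and not part of the original
argument.

```python
from fractions import Fraction
from math import comb, factorial


def prob_sn_flat(N: int, k: int) -> Fraction:
    """P{S_N = k} from (1.2) with dF(p) = dp, computed exactly."""
    # Beta integral: int_0^1 p^k (1-p)^(N-k) dp = k! (N-k)! / (N+1)!
    beta_integral = Fraction(factorial(k) * factorial(N - k), factorial(N + 1))
    return comb(N, k) * beta_integral


if __name__ == "__main__":
    N = 6
    # Every value of k should receive the same probability 1/(N+1).
    print([prob_sn_flat(N, k) for k in range(N + 1)])
```

Exact rational arithmetic is used so that the equality with $(N+1)^{-1}$
can be tested without rounding error.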


This principle came under strong criticism during the latter part of
the 19th century,⁴ and some advocates of its use (Edgeworth (1884, p. 230),
Pearson (1907)) adopted the position that taking $dF(p) = dp$ was often
approximately justifiable on the basis of experience and background
information; a position which suggests that other priors might equally well
express and quantify states of knowledge previous to the receipt of
sampling data. It was against this background that the actuary G.F. Hardy
(1889) and the mathematician W.A. Whitworth (1897, pp. 224-225) both
proposed the class of beta priors

    $B(\alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \, p^{\alpha - 1} (1 - p)^{\beta - 1}$,    $\alpha, \beta > 0$,

as suitable for quantification of prior knowledge.

In 1778 Laplace proposed the obvious multinomial generalization of the
Bayes-Laplace prior (Laplace (1781, §33); cf. De Morgan (1845, §48-49),
Bachelier (1912, p. 503), Lidstone (1920)): if $X_1, X_2, \ldots, X_N$ are
the outcomes of a t-category multinomial with unknown sampling vector
$p = (p_1, \ldots, p_t)$ and frequency counts
$\mathbf{n} = (n_1, n_2, \ldots, n_t)$, then

(1.3)    $P\{n_1, \ldots, n_t\} = \frac{N!}{n_1! \cdots n_t!} \int_{\sum p_i = 1} \prod_{i=1}^{t} p_i^{n_i} \, dF(p)$

with $dF(p) = dp_1 \, dp_2 \cdots dp_{t-1}$, which implies that

(1.4)    $P\{X_{N+1} \in \text{i-th category} \mid \mathbf{n}\} = \frac{n_i + 1}{N + t}$.

In 1924 W.E. Johnson gave a justification for (1.4) parallel to
Bayes's: if all ordered t-partitions $n_1 + n_2 + \cdots + n_t$ of $N$ are
assumed to be a priori equally likely, then (1.4) must hold; it follows
(as observed by Good (1965, p. 25)) that the moments of $dF$, and hence
$dF$ itself, are uniquely determined.
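The step from equally likely partitions to (1.4) can be checked
numerically. The sketch below assumes exchangeability, so that every
sequence with the same counts has the same probability, and assumes the
combination postulate holds at both $N$ and $N+1$; the helper names are
illustrative only.

```python
from fractions import Fraction
from math import comb, factorial


def multinomial(counts):
    """Number of distinct sequences having the given category counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


def sequence_prob(counts):
    """Probability of one particular sequence with these counts, assuming
    exchangeability and equally likely ordered t-partitions of N."""
    N, t = sum(counts), len(counts)
    # There are C(N+t-1, t-1) ordered t-partitions of N.
    partition_prob = Fraction(1, comb(N + t - 1, t - 1))
    return partition_prob / multinomial(counts)


def predictive(counts, i):
    """P{X_{N+1} = i | n} as a ratio of two sequence probabilities."""
    extended = list(counts)
    extended[i] += 1
    return sequence_prob(extended) / sequence_prob(counts)


if __name__ == "__main__":
    counts = (2, 0, 3)  # N = 5 observations in t = 3 categories
    print([predictive(counts, i) for i in range(len(counts))])
```

For these counts the three predictive probabilities are $(n_i + 1)/(N + t)$
with $N + t = 8$, in agreement with (1.4).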

It was against this background that Johnson, not entirely satisfied
with his equiprobability (or "combination") postulate, proposed another,
more general one (his "sufficientness" postulate), which had the
consequence of forcing $dF$ to be a member of the Dirichlet family

(1.5)    $\mathrm{Dir}(k_1, \ldots, k_t) = \frac{\Gamma(\sum_i k_i)}{\prod_i \Gamma(k_i)} \prod_{i=1}^{t} p_i^{k_i - 1} \, dp_1 \cdots dp_{t-1}$    ($k_i > 0$, all $i$).


2. Finite Exchangeable Sequences. Let $X_1, X_2, \ldots, X_N, X_{N+1}$
be a sequence of random variables, each taking values in the set
$\mathbf{t} = \{1, 2, \ldots, t\}$, $N \ge 1$ and $t \le \infty$, such that

(2.1)    $P\{X_1 = i_1, \ldots, X_{N+1} = i_{N+1}\} > 0$,    all $(i_1, \ldots, i_{N+1}) \in \mathbf{t}^{N+1}$.

Let $\mathbf{n} = \mathbf{n}(X_1, \ldots, X_N)$ denote the t-vector of
frequency counts, i.e. $\mathbf{n} = (n_1, n_2, \ldots, n_t)$, where
$n_i = n_i(X_1, \ldots, X_N) = \#\{X_j = i\}$. Johnson's sufficientness
postulate assumes that

(2.2)    $P\{X_{N+1} = i \mid X_1, \ldots, X_N\} = f_i(n_i)$,

that is, the conditional probability of an outcome in the i-th cell given
$X_1, \ldots, X_N$ only depends on $n_i$, the number of outcomes in that
cell previously. (Note that (2.2) is well-defined because of (2.1).) If
$X_1, \ldots, X_{N+1}$ is exchangeable,
$f_i(n_i) = P\{X_{N+1} = i \mid n_i\}$.

LEMMA 2.1: If $t > 2$ and (2.1), (2.2) hold, then there exist constants
$a_i > 0$ and $b$ such that for all $i$,

(2.3)    $f_i(n_i) = a_i + b n_i$.

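Proof: First assume $N \ge 2$. Let

    $\mathbf{n} = (n_1, \ldots, n_i, \ldots, n_j, \ldots, n_k, \ldots, n_t)$

be a fixed ordered partition of $N$, with $i, j, k$ three fixed distinct
indices such that $0 < n_i, n_j$ and $n_i, n_k < N$, and let

    $\mathbf{n}_1 = (n_1, \ldots, n_i + 1, \ldots, n_j - 1, \ldots, n_k, \ldots, n_t)$,
    $\mathbf{n}_2 = (n_1, \ldots, n_i, \ldots, n_j - 1, \ldots, n_k + 1, \ldots, n_t)$,
    $\mathbf{n}_3 = (n_1, \ldots, n_i - 1, \ldots, n_j, \ldots, n_k + 1, \ldots, n_t)$.

Note that for any $\mathbf{n}$

(2.4)    $\sum_{i \in \mathbf{t}} f_i(n_i) = 1$,

hence applying (2.4) to $\mathbf{n}$, $\mathbf{n}_1$, $\mathbf{n}_2$,
$\mathbf{n}_3$ in turn and subtracting, we obtain

(2.5)    $f_i(n_i + 1) - f_i(n_i) = f_j(n_j) - f_j(n_j - 1)$
                                 $= f_k(n_k + 1) - f_k(n_k)$
                                 $= f_i(n_i) - f_i(n_i - 1)$.

Thus

    $f_i(n_i) = a_i + b n_i$,

where $a_i = f_i(0) > 0$ (because of (2.1)), and $b = \Delta f_i(n_i)$ is
independent of $i$ (because of (2.5)).

If $N = 1$, let $c_i = f_i(1)$; it then follows from (2.4) that for any
$j \ne i$, $a_j + c_i = a_i + c_j$, hence $c_i - a_i = c_j - a_j \equiv b$.

Let $A = \sum_i a_i$. It follows from (2.3), (2.4) that

(2.6)    $A + bN = 1$,

hence $A < \infty$ and

(2.7)    $b = (1 - A)/N$.

Suppose $b \ne 0$. Then letting $k_i = a_i / b$ and $K = \sum k_i$, we
see from (2.6) that

    $b^{-1} = N + A/b = N + K$,

hence

    $f_i(n_i) = a_i + b n_i = \frac{k_i + n_i}{b^{-1}} = \frac{n_i + k_i}{N + K}$.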

EXAMPLE 2.1 (Sampling without replacement.) Let $X_1, X_2, \ldots, X_{N+1}$
denote a random sample drawn from a finite population with $m_i \ge 1$
members in each category $i$. Let $M = m_1 + \cdots + m_t$ and let
$N < m_i$, all $i$. Then

(2.8)    $P\{X_{N+1} \in \text{category } i \mid \mathbf{n}\} = \frac{m_i - n_i}{M - N}$.

Thus $a_i = m_i/(M - N)$ and $b = (N - M)^{-1} < 0$. Note that
$k_i = -m_i$; thus $k_i$ (and hence $K$) is independent of $N$, although
$a_i$, $A$, and $b$ are not. The next lemma states that this is always the
case if, as here, the $X_i$ are exchangeable and $b \ne 0$.
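The coefficients in Example 2.1 can be computed directly. The sketch
below takes a hypothetical population `m` and sample size `N`, forms
$a_i$ and $b$ as in the example, and recovers $k_i = a_i / b = -m_i$; the
function names are illustrative only.

```python
from fractions import Fraction


def hypergeometric_coefficients(m, N):
    """Return (a, b, k) of (2.3) for sampling without replacement.

    m: population sizes per category (hypothetical input);
    N: number of draws already made, with N < sum(m).
    """
    M = sum(m)
    a = [Fraction(m_i, M - N) for m_i in m]  # a_i = m_i / (M - N)
    b = Fraction(1, N - M)                   # b = 1 / (N - M) < 0
    k = [a_i / b for a_i in a]               # k_i = a_i / b
    return a, b, k


def predictive(m, n, i):
    """(2.8): remaining members of category i over remaining population."""
    return Fraction(m[i] - n[i], sum(m) - sum(n))


if __name__ == "__main__":
    m = (5, 7, 9)
    for N in (1, 2, 3):
        a, b, k = hypergeometric_coefficients(m, N)
        print(N, a, b, k)  # a and b change with N; k does not
```

Running the loop shows $a_i$ and $b$ varying with $N$ while each $k_i$
stays fixed at $-m_i$.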

Let $a_i^{(N)}$, $b^{(N)}$, $k_i^{(N)}$, and $f_i(n_i, N)$ denote the
dependence of $a_i$, $b$, $k_i$, and $f_i(n_i)$ on $N$. Thus, if (2.1) and
(2.2) are satisfied for a fixed $N \ge 1$, then there exist $a_i^{(N)}$
and $b^{(N)}$ such that for all $i$,
$f_i(n_i, N) = a_i^{(N)} + b^{(N)} n_i$. Note that $b^{(N)} = 0$ if and
only if $\{X_1, \ldots, X_N\}$ and $X_{N+1}$ are independent.


LEMMA 2.2: Let $X_1, X_2, \ldots, X_{N+1}, X_{N+2}$ be an exchangeable
sequence of t-valued random variables, $N \ge 1$ and $t \ge 2$, satisfying
(2.1) and (2.3) for both $N+1$ and $N+2$.

(i) If $b^{(N)} \cdot b^{(N+1)} = 0$, then $b^{(N)} = b^{(N+1)} = 0$.

(ii) If $b^{(N)} \cdot b^{(N+1)} \ne 0$, then $b^{(N)} \cdot b^{(N+1)} > 0$
and $k_i^{(N)} = k_i^{(N+1)}$, all $i$.


Proof: (i) Choose and fix two distinct indices $i \ne j$. Let
$a_i = a_i^{(N)}$, $a_i' = a_i^{(N+1)}$, $b = b^{(N)}$, $b' = b^{(N+1)}$,
etc. Suppose $b = 0$. It follows from exchangeability that for any
partition $\mathbf{n}$ of $N$,

(2.9)    $P\{X_{N+1} = i, X_{N+2} = j \mid \mathbf{n}\} = P\{X_{N+1} = j, X_{N+2} = i \mid \mathbf{n}\}$,

hence

(2.10)    $(a_i)(a_j' + b' n_j) = (a_j)(a_i' + b' n_i)$.

First taking $\mathbf{n}$ in (2.10) with $n_i = 0$, $n_j = N$, then
with $n_i = N$, $n_j = 0$ and subtracting, we obtain
$a_i b' N = -a_j b' N$, hence $b' = 0$ (since $a_i, a_j > 0$). Similarly,
if $b' = 0$ then $b = 0$.

(ii) Suppose $b \cdot b' \ne 0$. Then it follows from (2.9) that for any
partition $\mathbf{n}$ of $N$,

(2.11)    $\left(\frac{n_i + k_i}{N + K}\right) \left(\frac{n_j + k_j'}{N + 1 + K'}\right) = \left(\frac{n_j + k_j}{N + K}\right) \left(\frac{n_i + k_i'}{N + 1 + K'}\right)$,

hence

(2.12)    $n_i k_j' + k_i n_j + k_i k_j' = n_j k_i' + k_j n_i + k_j k_i'$.

Letting $n_i = 0$, $n_j = N$ in (2.12), then $n_i = N$, $n_j = 0$ and
subtracting, we obtain $k_i + k_j = k_i' + k_j'$; since $i$ and $j$ were
arbitrary, this implies $K = K'$ and, if $t > 2$, $k_i = k_i'$ for all $i$.
Since $a_i, a_i' > 0$, clearly $b$ and $b'$ must have the same sign.

Suppose $t = 2$ (so that $i = 1$, $j = 2$, say, and $K = k_1 + k_2$).
Taking $n_i = 0$, $n_j = N$ in (2.12), we obtain
$k_i (N + k_j') = k_i' (N + k_j)$, hence

    $k_i (N + K - k_i') = k_i' (N + K - k_i)$,

from which it follows (since $N + K = b^{-1} \ne 0$) that $k_i = k_i'$,
hence $k_j = k_j'$. ∎

Together, Lemmas 2.1 and 2.2 immediately imply

THEOREM 2.1: Let $X_1, X_2, \ldots, X_{N_0}$ ($N_0 \ge 3$) be an
exchangeable sequence of t-valued random variables such that for every
$N < N_0$, (i) (2.1) holds, (ii) (2.2) holds if $t > 2$ or (2.3) holds if
$t = 2$. If the $\{X_i\}$ are not independent
($\Leftrightarrow b^{(1)} \ne 0$), then there exist constants $k_i \ne 0$,
either all positive or all negative, such that $N + \sum k_i \ne 0$ and

(2.13)    $P\{X_{N+1} = i \mid \mathbf{n}\} = \frac{n_i + k_i}{N + \sum k_i}$

for every $N < N_0$, partition $\mathbf{n}$ of $N$, and $i \in \mathbf{t}$.
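A short sketch may help make the predictive rule of Theorem 2.1 concrete.
It implements $(n_i + k_i)/(N + \sum k_i)$ and builds sequence probabilities
by the product rule, taking $P\{X_1 = i\} = k_i / K$ as the starting
probability; categories are indexed from 0 here for convenience, and all
names are illustrative assumptions rather than notation from the text.

```python
from fractions import Fraction
from itertools import permutations


def predictive(counts, k, i):
    """Predictive rule of Theorem 2.1: (n_i + k_i) / (N + sum_j k_j)."""
    return Fraction(counts[i] + k[i]) / (sum(counts) + sum(k))


def sequence_prob(seq, k):
    """Probability of an ordered sequence via the product rule, starting
    from P{X_1 = i} = k_i / K (the same formula evaluated at N = 0)."""
    counts = [0] * len(k)
    prob = Fraction(1)
    for x in seq:
        prob *= predictive(counts, k, x)
        counts[x] += 1
    return prob


if __name__ == "__main__":
    k = [Fraction(1, 2), 2, 3]  # all positive, not necessarily equal
    # Exchangeability: every reordering has the same probability.
    print({sequence_prob(p, k) for p in permutations((0, 0, 1, 2))})
    # Negative k reproduces sampling without replacement (k_i = -m_i).
    print(sequence_prob((0, 1, 0), [-3, -4]))
```

The first print yields a single value, since the product depends on the
sequence only through its counts. The second uses negative constants and
matches drawing without replacement from a population with 3 and 4 members.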

COROLLARY 2.1: If $X_1, X_2, X_3, \ldots$ is an infinitely exchangeable
sequence which for every $N \ge 1$ satisfies both (i) (2.1), and (ii)
either (2.2), if $t > 2$, or (2.3), if $t = 2$, then $b^{(1)} \ge 0$.

Proof. Suppose $b^{(1)} < 0$. But then $N + K = 1/b^{(N)} < 0$ for all
$N$, which is clearly impossible. ∎

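COROLLARY 2.2: For all $N \le N_0$, under the conditions of Theorem 2.1,

(2.14)    $P\{X_1 = i_1, \ldots, X_N = i_N\} = \frac{\prod_{i \in \mathbf{t}} \prod_{j=0}^{n_i - 1} (k_i + j)}{\prod_{j=0}^{N-1} (K + j)}$,

where $\mathbf{n} = (n_1, \ldots, n_t)$ is the vector of frequency counts
of $(i_1, \ldots, i_N)$ and $K = \sum k_i$.
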
Proof. It follows from the product rule for conditional probabilities
that it suffices to prove $P\{X_1 = i\} = k_i / K$ for all
$i \in \mathbf{t}$. But

(2.15)    $P\{X_1 = i, X_2 = j\} = (a_j^{(1)} + b^{(1)} \delta_j(i)) \, P\{X_1 = i\}$,

where $\delta_j(i)$ is the indicator function of $\{i = j\}$. Summing over
$i$ in (2.15) gives $P\{X_2 = j\} = a_j^{(1)} + b^{(1)} P\{X_1 = j\}$, hence
by exchangeability $P\{X_1 = j\} = a_j^{(1)} / (1 - b^{(1)}) = k_j / K$,
since $a_j^{(1)} = k_j b^{(1)}$, $1 - b^{(1)} = A^{(1)}$ (cf. (2.6)), and
$K = A^{(1)} / b^{(1)}$. ∎

It follows from Corollary 2.2 that $\{k_i : i \in \mathbf{t}\}$ uniquely
determines $P = \mathcal{L}(X_1, X_2, \ldots, X_{N_0})$. Conversely, for
every summable sequence of constants $\{k_i\}$, all of the same sign, there
exists a maximal sequence of t-valued random variables
$X_1, X_2, \ldots, X_{N_0}$ ($N_0 \le \infty$) such that (2.1) and (2.13)
hold. The length of this sequence is determined by $N^*$, the largest value
of $N$ such that the right-hand side of (2.13) determines a positive
probability measure on $\mathbf{t}$, i.e., $N_0 = N^* + 1$, where

(i) if $k_i > 0$, all $i$, and $\sum k_i < \infty$, then $N^* = \infty$, or

(ii) if $k_i < 0$, all $i$, and $\sum |k_i| < \infty$, then

    $N^* = \max\{N \ge 0 : N + k_i < 0, \text{ all } i\}$.

Thus, if $K < 0$, $N^* = [\min\{|k_i| : i \in \mathbf{t}\}]$, where
$[\cdot]$ denotes the integer part of a number. Hence, if $N_0 > 1$, then
$t < \infty$ (since $\sum |k_i| < \infty$ implies $k_i \to 0$).
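The two cases for $N^*$ can be sketched as a small function. It works
from the defining condition $N + k_i < 0$ directly rather than from the
integer-part shortcut, and it assumes a finite list of constants; the name
`max_conditioning_length` is hypothetical.

```python
from math import inf


def max_conditioning_length(k):
    """N* for a finite list of nonzero constants k_i of a common sign."""
    if all(k_i > 0 for k_i in k):
        return inf  # case (i): the rule stays positive for every N
    if all(k_i < 0 for k_i in k):
        # case (ii): largest N >= 0 with N + k_i < 0 for all i
        N = 0
        while all(N + 1 + k_i < 0 for k_i in k):
            N += 1
        return N
    raise ValueError("the k_i must be nonzero and all of the same sign")


if __name__ == "__main__":
    print(max_conditioning_length([0.5, 2, 3]))  # positive constants
    print(max_conditioning_length([-2.5, -4]))   # non-integer minimum
    print(max_conditioning_length([-3, -5]))     # k_i = -m_i as in Example 2.1
```

When $\min |k_i|$ is not an integer the result agrees with its integer
part. When $\min |k_i|$ is an integer $m$, the strict inequality gives
$N^* = m - 1$, so that $N_0 = m$; in Example 2.1 this is the size of the
smallest category.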




3.  Complements  and  Extensions. 

3.1. The Symmetric Dirichlet. Johnson considered the special case where
(i) $f_i(n_i, N)$ is independent of $i$, i.e., for each $N$, there exists a
single function $f$ such that

(3.1)    $P\{X_{N+1} = i \mid \mathbf{n}\} = f(n_i, N)$ for all $i$;

(ii) $b$ is positive.⁵

Under these conditions $t < \infty$, $a_i \equiv a$, $k_i \equiv k > 0$,
$P\{X_1 = i\} = 1/t$,

(3.2)    $P\{X_{N+1} = i \mid \mathbf{n}\} = \frac{n_i + k}{N + kt}$,

and $X_1, \ldots, X_N$ can be extended to an infinitely exchangeable
sequence, whose mixing measure $dF$ in the de Finetti representation is the
symmetric Dirichlet distribution with parameter $k$.⁶
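The link between the symmetric Dirichlet mixing measure and the
predictive rule $(n_i + k)/(N + kt)$ can be checked with exact arithmetic.
The sketch below relies on the standard Dirichlet moment formula
$E[\prod_i p_i^{n_i}] = \prod_i k^{(n_i)} / (kt)^{(N)}$, where $x^{(n)}$ is
the rising factorial; the function names are illustrative.

```python
from fractions import Fraction


def rising(x, n):
    """Rising factorial x (x+1) ... (x+n-1) = Gamma(x+n) / Gamma(x)."""
    out = Fraction(1)
    for j in range(n):
        out *= x + j
    return out


def dirichlet_moment(counts, k):
    """E[prod_i p_i^{n_i}] under a symmetric Dirichlet with parameter k."""
    t, N = len(counts), sum(counts)
    num = Fraction(1)
    for n_i in counts:
        num *= rising(k, n_i)
    return num / rising(k * t, N)


def predictive_from_mixture(counts, k, i):
    """P{X_{N+1} = i | n} as a ratio of mixed multinomial probabilities."""
    extended = list(counts)
    extended[i] += 1
    return dirichlet_moment(extended, k) / dirichlet_moment(counts, k)


if __name__ == "__main__":
    k = Fraction(1, 2)
    counts = (3, 0, 1)
    print([predictive_from_mixture(counts, k, i) for i in range(3)])
```

With $k = 1/2$, $t = 3$ and counts $(3, 0, 1)$ the ratios come out as
$(n_i + k)/(N + kt)$, for example $7/11$ for the first category.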

3.2. Alternate approaches. Let $\Delta_t$ be the probability simplex
$\{p_i \ge 0, \ i = 1, \ldots, t : \sum p_i = 1\}$. Doksum (1974,
Corollary 2.1) states in the present setting that a probability measure
$dF$ on $\Delta_t$ has a posterior distribution
$dF(p_i \mid X_1, \ldots, X_N)$, which depends on the sample only through
the values of $n_i$ and $N$, if and only if $dF$ is Dirichlet or

(i) $dF$ is degenerate at a point (i.e., $X_1, X_2, \ldots$ is independent);

(ii) $dF$ concentrates on a random point (i.e., $dF$ is supported on
the extreme points $\{\delta_i(j) : i = 1, \ldots, t\}$ of $\Delta_t$, so
that (2.1) would not hold);

(iii) $dF$ concentrates on two nonrandom points (i.e., $t = 2$ or
can be taken to be so).

This is a slightly weaker result than Johnson's, which only makes
the corresponding assumption about the posterior expectations of the
$p_i$.


Diaconis and Ylvisaker (1979, pp. 279-280) prove (using Ericson's
theorem (1969, p. 323)) that the beta family is the unique one allowing
linear posterior expectation of success in exchangeable binomial sampling,
i.e., $t = 2$ and $\{X_n\}$ infinitely exchangeable, and remark that their
method may be extended to similarly characterize the Dirichlet priors in
multinomial sampling. Ericson's results can even be applied in the
finitely exchangeable case and permit the derivation of alternate
expressions for the coefficients $a_i$ and $b$ of (2.3).


3.3. When is Johnson's postulate inadequate? In practical applications
Johnson's sufficientness postulate, like exchangeability, may or may not
be an adequate description of our state of knowledge. Johnson himself
did not view his postulate as universally applicable:

    the postulate adopted in a controversial kind of theorem
    cannot be generalized to cover all sorts of working problems;
    so it is the logician's business, having once formulated a
    specific postulate, to indicate very carefully the factual
    and epistemic conditions under which it has practical value.
    [Johnson (1932, pp. 418-419)].

Jeffreys (1939, §3.23) briefly discusses when such conditions may hold.
Good (1953, p. 241; 1965, pp. 26-27) remarks that the use of Johnson's
postulate fails to take advantage of information contained in the
"frequencies of frequencies" (often useful in sampling of species
problems), and elsewhere (Good, 1967) advocates mixtures of symmetric
Dirichlets as frequently providing more satisfactory initial distributions
in practice.


3.4. Partition exchangeability. If the cylinder sets
$\{X_1 = i_1, \ldots, X_N = i_N\}$ are identified with the functions
$g : \{1, \ldots, N\} \to \{1, \ldots, t\}$, then the exchangeable
probability measures $P$ are precisely those $P$ such that

    $P\{g \circ \pi\} = P\{g\}$

for all $g$ and all permutations $\pi$ of $\mathbf{N} = \{1, 2, \ldots, N\}$.
Equivalently, the exchangeable $P$'s are those such that the frequencies
$\mathbf{n}$ are sufficient statistics with $P\{\cdot \mid \mathbf{n}\}$
uniform.

The rationale for exchangeability is the assumption that the index
set $\mathbf{N}$ conveys no information other than serving to distinguish
one element of a sample from another. In the situation envisaged by Johnson,
Carnap (see Section 4 below), and others, a similar state of knowledge
obtains vis-a-vis the index set $\mathbf{t}$ (think of the categories as
colors).

Then it would be reasonable to require of $P$ that

    $P\{\pi_2 \circ g \circ \pi_1\} = P\{g\}$

for all functions $g : \mathbf{N} \to \mathbf{t}$, and permutations $\pi_1$
of $\mathbf{N}$, $\pi_2$ of $\mathbf{t}$.

Call such $P$'s partition-exchangeable. The motivation for the name is
the following. Let $a(\mathbf{n}) = \{a_r : 0 \le r \le N\}$ denote the
frequencies of the frequencies $\mathbf{n}$, i.e., $a_r = \#\{n_i = r\}$.
Then $P$ is partition-exchangeable if and only if the $a_r$ are sufficient
with $P\{\cdot \mid a(\mathbf{n})\}$ uniform, i.e. $P\{g_1\} = P\{g_2\}$
whenever $a(\mathbf{n}(g_1)) = a(\mathbf{n}(g_2))$. The set of
partition-exchangeable probabilities is a convex set containing the
symmetric Dirichlets. From this perspective the frequencies of frequencies
emerge as maximally informative statistics and the mixtures of symmetric
Dirichlets as partition-exchangeable.
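The frequencies of frequencies are easy to compute from a sample. The
sketch below, with illustrative function names, forms the counts
$\mathbf{n}$ and then $a(\mathbf{n})$, and shows that both reordering the
sample and relabeling the categories leave $a(\mathbf{n})$ unchanged.

```python
from collections import Counter


def frequency_counts(seq, t):
    """n = (n_1, ..., n_t) for a sequence with values in {1, ..., t}."""
    tally = Counter(seq)
    return [tally.get(i, 0) for i in range(1, t + 1)]


def frequencies_of_frequencies(counts):
    """a(n) = (a_0, ..., a_N), where a_r = #{i : n_i = r}."""
    N = sum(counts)
    tally = Counter(counts)
    return [tally.get(r, 0) for r in range(N + 1)]


if __name__ == "__main__":
    t = 4
    g1 = [1, 1, 3, 2, 1]
    # Same sample after permuting positions and relabeling categories.
    g2 = [4, 2, 4, 4, 1]
    for g in (g1, g2):
        n = frequency_counts(g, t)
        print(n, frequencies_of_frequencies(n))
```

Both samples give $a(\mathbf{n}) = (1, 2, 0, 1, 0, 0)$ although their
count vectors differ, which is the invariance that partition
exchangeability requires.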

It would be of interest to have extensions of Johnson's results to
"representative functions" of the functional form
$f = f(n_i, a(\mathbf{n}))$; for partial results in this direction
($f = f(n_i, a_0)$), see Hintikka and Niiniluoto (1976), Kuipers (1978).
It would also be of interest to have Johnson type results for Markov
exchangeable and other classes of partially exchangeable sequences of
random variables; cf. Diaconis and Freedman (1980) for the definition and
further references; Niiniluoto (1980) for an initial attempt.


4. Historical Note. Johnson's results appear to have attracted little
interest during his lifetime. C.D. Broad, in his review of Johnson's
Logic (vol. 3, 1924), while favorable in his overall assessment of the
book, was highly critical of the appendix on "eduction" (in which Johnson
introduced the concept of exchangeability and characterized the multinomial
generalization of the Bayes-Laplace prior!): "About the Appendix all I can
do is, with the utmost respect to Mr. Johnson, to parody Mr. Hobbes's
remark about the treatises of Milton and Salmasius: 'Very good
mathematics; I have rarely seen better. And very bad probability; I have
rarely seen worse.'" [Broad (1924, p. 379); see generally pp. 377-379.]
Other than this, two of the few references to Johnson's work on the
multinomial, prior to Good (1965), are passing comments in Harold
Jeffreys's Theory of Probability (1939, §3.23), and Good (1953,
pp. 238-241). This general neglect is all the more surprising, inasmuch as
Johnson could count among his students Keynes, Ramsey, and Dorothy Wrinch
(one of Jeffreys's collaborators).⁷

It is ironical that in the decades after Johnson's death, Rudolf
Carnap and his students would, unknowingly, reproduce much of Johnson's
work. In 1945 Carnap introduced the function $c^*$
[$= P\{X_{N+1} = i \mid \mathbf{n}\}$] and proved that it had to have the
form (1.4) under the assumption that all "structure-descriptions"
[= partitions $\mathbf{n}$] were a priori equally likely.⁸ And just as
Johnson grew uneasy with his combination postulate, so too Carnap would
later introduce the family of functions
$\{c_\lambda : 0 < \lambda < \infty\}$ [$= (n_i + k)/(N + kt)$, $\lambda$
corresponding to our $kt$], the so-called "continuum of inductive methods"
(Carnap (1952)). But while Johnson proved that (3.2) followed from the
sufficientness postulate (3.1), Carnap initially assumed both, although his
collaborator John G. Kemeny was soon after able to show their equivalence
for $t > 2$. Subsequently Carnap generalized these results, first proving
(3.2) follows from the linearity assumption (2.3) when $t = 2$ (Carnap and
Stegmüller (1959)), and later, in his last and posthumously published work
on the subject, dropping the equiprobability assumption (3.1) in favor of
(2.2) (Carnap (1980, §19); cf. Kuipers (1978)).⁹

For details of Johnson's life, see Broad (1931), Braithwaite (1949);
for assessments of his philosophical work, Passmore (1968, pp. 135-136,
343-346), Smokler (1967), Prior (1967, p. 551). In addition to his work
in philosophy, Johnson wrote several papers on economics, one of which,
on utility theory, is of considerable importance; all are reprinted, with
brief commentary, in Baumol and Goldfeld (1968).


Acknowledgment: I am grateful to Persi Diaconis and Stephen Stigler for
a number of helpful comments and references.

1. This research was supported by Office of Naval Research Contract
N00014-76-C-0475 (NR-042-267).

2. Good (1967) later shifted to the term "sufficientness" to avoid
confusion with the usual statistical meaning of sufficiency.

3. The argument can be made rigorous by noting that $dF$ is uniquely
determined by its moments; see, e.g., Murray (1930); Edwards (1974, 1978).
Stigler (1981) traces how Bayes's argument was systematically distorted
by later statisticians to fit their own foundational preconceptions.

4. Most notably by Boole, Venn, and Chrystal. Unfortunately, Fisher's
account (1936, Chapter 2) of their reservations is seriously flawed; see
Zabell (1982).

5. This is the major gap in Johnson's proof. If $\{X_1, X_2, \ldots\}$ is
infinitely exchangeable, but not independent, the assumption that $b$ is
positive is superfluous (see Corollary 2.1 above).

6. Good (1965, p. 25) suggests that Johnson was "unaware of the
connection between the use of a flattening constant $k$ and the symmetrical
Dirichlet distribution." However, Johnson was at least aware of the
connection when $k = 1$, for he wrote of his derivation of (1.4) via
the combination postulate,

    ...I substitute for the mathematician's use of Gamma
    functions and the α-multiple integrals, a comparatively
    simple piece of algebra, and thus deduce a formula similar
    to the mathematician's, except that instead of for two, my
    theorem holds for α alternatives, primarily postulated as
    equiprobable. [Johnson (1932, p. 418); Johnson's α
    corresponds to our $t$.]

7. For Keynes's particular indebtedness to Johnson, see the former's
Treatise on Probability (1921), pp. 11 (fn. 1), 68-70, 116, 124 (fn. 2),
150-144; cf. Broad (1922), pp. 72, 78-79, Passmore (1968), pp. 345-346.

8. Carnap (1945); cf. Carnap (1950), Appendix.

9. For the historical evolution of this aspect of Carnap's work, see
Schilpp (1963), pp. 74-75, 979-980; Carnap and Jeffrey (1971), pp. 1-4,
223; Jeffrey (1980), pp. 1-5, 103-104.